Product Centric Web Page Segmentation and Localization
Authors
Abstract
The Internet is home to an ever-increasing array of goods and services available to the general consumer. These products are often discovered through search engines whose focus is on document retrieval rather than product procurement. The demand for details of specific products, as opposed to just documents containing such information, has resulted in an influx of product collection databases, deal aggregation services, mobile apps, Twitter feeds, and other just-in-time methods for rapidly finding and indexing sale events and notifying shoppers of them. This has led to our development of intelligent Web crawler technology aimed at this specific category of information retrieval. In this paper, we demonstrate our solution for Web page categorization, segmentation, and localization, which identifies Web pages with shopping deals and automatically extracts specifics from the identified pages. Our work is supported by empirical data on its effectiveness. A screencast demonstration is also available online at http://youtu.be/HHPme6AJuCk.
Similar resources
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, connected components analysis segments each region into homogeneous regions and identifyi...
Analysis and Modeling of Evolving Database-centric Web Applications
Database-centric web applications tend to evolve over time. However, there are no comprehensive tools to analyze and present the synopsis of changes for such applications. In this paper, we address the problem of analyzing an evolving application and presenting the synopsis of changes, which can be recursively drilled down in an interactive manner. Specifically, we analyze two versions of an ap...
A Web Page Segmentation Method by using Headlines to Web Contents as Separators and its Evaluations
In this paper, we describe a Web page segmentation method based on title blocks and show its evaluation. Title blocks are minimum blocks that function as headlines for specific Web content. A typical Web page consists of multiple elements with different types of features, such as main content, navigation panels, copyright and privacy notices, and advertisements. Web page segmentation is the div...
Client-Side Centric Model for Generating One-Page Modern Web Applications Dealing with Databases
Nowadays modern Web applications provide desktop-application-like flexible user experiences without using explicit requests. Modern Web applications need complex components: server-side logic programs, server-side communication programs, output Web pages, client-side logic programs, and client-side communication programs. We present a new generation model called client-side centric model. This m...
A Model for Web Page Usage Mining Based on Segmentation
Web page usage mining plays a vital role in enriching the page’s content and structure based on the feedback received from the user’s interactions with the page. This paper proposes a model for micro-managing the tracking activities by fine-tuning the mining from the page level to the segment level. The proposed model enables the webmaster to identify the segments which receive more focu...